85 research outputs found

    LELIE - An Intelligent Assistant for Improving Requirement Authoring

    Get PDF
    When writing or revising a set of requirements, or any technical document, it is particularly challenging to make sure that texts read easily and are unambiguous for any domain actor. Experience shows that even with several levels of proofreading and validation, most texts still contain a large number of language errors (lexical, grammatical, stylistic, business-related, or with respect to authoring recommendations) and lack overall cohesion and coherence. LELIE has been designed to track these errors and, whenever possible, to suggest corrections. LELIE also has a clear impact on technical writers' behavior: it rapidly becomes an essential and user-friendly authoring companion.
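
    As a rough illustration of the kind of checks such an authoring assistant performs, the sketch below flags a few typical requirement-writing defects (fuzzy terms, a missing requirement modal, passive constructions) with simple pattern matching. The word lists and rules are hypothetical placeholders, not LELIE's actual linguistic resources.

    ```python
    import re

    # Hypothetical word lists and rules; LELIE's real resources are far richer.
    FUZZY_TERMS = {"quickly", "adequate", "user-friendly", "as appropriate"}
    REQUIREMENT_MODAL = re.compile(r"\bshall\b", re.IGNORECASE)

    def check_requirement(text: str) -> list[str]:
        """Return a list of warnings for a single requirement sentence."""
        warnings = []
        lowered = text.lower()
        for term in FUZZY_TERMS:
            if term in lowered:
                warnings.append(f"fuzzy or unverifiable term: '{term}'")
        if not REQUIREMENT_MODAL.search(text):
            warnings.append("no 'shall' found: sentence may not be a requirement")
        if re.search(r"\bshall be \w+ed\b", lowered):
            warnings.append("passive construction: the responsible actor is unclear")
        return warnings

    print(check_requirement("The report shall be generated quickly."))
    ```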

    Korean Parsing Based on the Applicative Combinatory Categorial Grammar

    Get PDF
    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Discourse structure analysis for requirement mining

    Get PDF
    In this work, we first introduce two main approaches to writing requirements and then propose a method based on Natural Language Processing to improve requirement authoring and the overall coherence, cohesion, and organization of requirement documents. We investigate the structure of requirement kernels, and then the discourse structure associated with those kernels. This enables the system to accurately extract requirements and their related contexts from texts (a task called requirement mining). Finally, we report a first experiment on requirement mining over texts from seven companies, and conclude with an evaluation that compares these results against manually annotated document corpora.
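
    The sketch below gives a deliberately simplified picture of requirement mining: sentences carrying a requirement kernel (crudely detected here through a modal marker) are extracted together with a neighbouring sentence that carries a contextual discourse cue. The marker and cue lists are illustrative assumptions, not the discourse model used in the paper.

    ```python
    # Illustrative marker lists; the paper relies on a proper discourse analysis.
    KERNEL_MARKERS = {"shall", "must", "should"}
    CONTEXT_CUES = ("in order to", "if ", "when ", "unless", "so that")

    def mine_requirements(sentences: list[str]) -> list[dict]:
        mined = []
        for i, sent in enumerate(sentences):
            words = set(sent.lower().split())
            if words & KERNEL_MARKERS:                      # requirement kernel found
                context = [s for s in sentences[max(0, i - 1):i]
                           if any(cue in s.lower() for cue in CONTEXT_CUES)]
                mined.append({"requirement": sent, "context": context})
        return mined

    doc = ["When the engine is running, the oil pressure is monitored.",
           "The controller shall raise an alarm if pressure drops below 0.5 bar."]
    print(mine_requirements(doc))
    ```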

    Identification of fuzzy and underspecified terms in technical documents: an experiment with distributional semantics

    Get PDF
    This study takes place in the framework of the development of the linguistic resources used by an automatic verification system for technical documents such as specifications. Our objective is to semi-automatically enlarge the classes of intrinsically fuzzy terms, along with generic terms, in order to improve the system's detection of ambiguous passages identified as risk factors. We measure and compare the efficiency of automatic distributional analysis methods by comparing the results obtained on corpora of varying sizes and degrees of specialization, starting from a reduced list of seed terms. We show that while a corpus of too limited a size is unusable, its automatic extension with similar documents produces results that complement those obtained by distributional analysis on large generic corpora.
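
    A minimal sketch of the seed-based expansion described above, using gensim's Word2Vec as a stand-in for the distributional analysis tools actually used; the toy corpus, the parameters, and the seed list are placeholders for real specification corpora and validated term lists.

    ```python
    from gensim.models import Word2Vec

    # Toy tokenized corpus; in practice a large (possibly automatically extended)
    # technical corpus or a large generic corpus would be used.
    corpus = [
        ["the", "response", "time", "shall", "be", "adequate"],
        ["the", "delay", "shall", "be", "sufficient"],
        ["the", "interface", "shall", "be", "appropriate", "and", "adequate"],
    ]

    model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=0)

    seed_terms = ["adequate"]            # reduced list of seed ("prime") terms
    expanded = set(seed_terms)
    for term in seed_terms:
        if term in model.wv:
            # Candidate fuzzy terms are the distributional neighbours of the seeds.
            expanded.update(word for word, _ in model.wv.most_similar(term, topn=3))

    print(sorted(expanded))              # candidates to be validated manually
    ```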

    Identification of fuzzy and generic terms in technical documentation: an experiment with automatic distributional analysis

    Get PDF
    This study takes place in the framework of the development of the linguistic resources used by an automatic verification system for technical documents such as specifications. Our objective is to semi-automatically enlarge the classes of intrinsically fuzzy terms, along with generic terms, in order to improve the system's detection of ambiguous passages identified as risk factors. We measure and compare the efficiency of automatic distributional analysis methods by comparing the results obtained on corpora of varying sizes and degrees of specialization, starting from a reduced list of seed terms. We show that while a corpus of too limited a size is unusable, its automatic extension with similar documents produces results that complement those obtained by distributional analysis on large generic corpora.

    Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction

    Full text link
    Many automated test generation techniques have been developed to aid developers with writing tests. To facilitate full automation, most existing techniques aim to either increase coverage or generate exploratory inputs. However, existing test generation techniques largely fall short of achieving more semantic objectives, such as generating tests to reproduce a given bug report. Reproducing bugs is nonetheless important, as our empirical study shows that the number of tests added in open source repositories due to issues was about 28% of the corresponding project test suite size. Meanwhile, due to the difficulty of transforming the expected program semantics in bug reports into test oracles, existing failure reproduction techniques tend to deal exclusively with program crashes, a small subset of all bug reports. To automate test generation from general bug reports, we propose LIBRO, a framework that uses Large Language Models (LLMs), which have been shown to be capable of performing code-related tasks. Since LLMs themselves cannot execute the target buggy code, we focus on post-processing steps that help us discern when LLMs are effective, and rank the produced tests according to their validity. Our evaluation of LIBRO shows that, on the widely studied Defects4J benchmark, LIBRO can generate failure-reproducing test cases for 33% of all studied cases (251 out of 750), while suggesting a bug reproducing test in first place for 149 bugs. To mitigate data contamination, we also evaluate LIBRO against 31 bug reports submitted after the collection of the LLM training data terminated: LIBRO produces bug reproducing tests for 32% of the studied bug reports. Overall, our results show LIBRO has the potential to significantly enhance developer efficiency by automatically generating tests from bug reports. Accepted to the IEEE/ACM International Conference on Software Engineering 2023 (ICSE 2023).
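
    The snippet below sketches only one part of the post-processing described above: after several candidate tests have been sampled from an LLM for a bug report, syntactically identical samples are clustered and larger clusters are ranked first. This cluster-size heuristic is a simplification of LIBRO's actual post-processing and validity ranking, and the sample tests are made-up placeholders.

    ```python
    from collections import Counter

    def rank_candidate_tests(candidate_tests: list[str]) -> list[tuple[str, int]]:
        """Cluster LLM-sampled tests after whitespace normalisation and rank
        clusters by size: tests the model produces repeatedly come first."""
        normalised = Counter(" ".join(test.split()) for test in candidate_tests)
        return normalised.most_common()

    # Made-up samples standing in for LLM outputs for a single bug report.
    samples = [
        '@Test public void testIssue() { assertEquals(1, parse("1")); }',
        '@Test public void testIssue() { assertEquals(1,  parse("1")); }',
        '@Test public void testOther() { assertTrue(check()); }',
    ]
    print(rank_candidate_tests(samples))
    ```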

    Towards Autonomous Testing Agents via Conversational Large Language Models

    Full text link
    Software testing is an important part of the development cycle, yet it requires specialized expertise and substantial developer effort to adequately test software. Recent discoveries about the capabilities of large language models (LLMs) suggest that they can be used as automated testing assistants, and thus provide helpful information and even drive the testing process. To highlight the potential of this technology, we present a taxonomy of LLM-based testing agents based on their level of autonomy, and describe how a greater level of autonomy can benefit developers in practice. An example use of LLMs as a testing assistant is provided to demonstrate how a conversational framework for testing can help developers. This also highlights how the often-criticized hallucination of LLMs can be beneficial during testing. We identify other tangible benefits that LLM-driven testing agents can bestow, and also discuss potential limitations.
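
    As a toy illustration of what a low-autonomy, conversational testing assistant could look like, the loop below keeps the full message history so that follow-up questions about earlier test suggestions stay in context. The llm_reply function is a placeholder for a real LLM client, not an API from the paper.

    ```python
    def llm_reply(messages: list[dict]) -> str:
        """Placeholder; replace with a call to an actual LLM chat API."""
        raise NotImplementedError

    def testing_assistant_session(code_under_test: str) -> None:
        messages = [{"role": "system",
                     "content": "You help write and critique unit tests for:\n" + code_under_test}]
        while True:
            user_turn = input("developer> ")
            if user_turn.strip().lower() in {"quit", "exit"}:
                return
            messages.append({"role": "user", "content": user_turn})
            reply = llm_reply(messages)      # e.g. proposes a test or explains a failure
            messages.append({"role": "assistant", "content": reply})
            print("assistant>", reply)
    ```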

    The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications

    Full text link
    Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis capabilities, which has led to their rapid adoption in software engineering applications. However, details about LLM training data are often not made public, which has caused concern as to whether existing bug benchmarks are included. Since the training data of the popular GPT models is not available, we examine the training data of the open-source LLM StarCoder, and find it likely that data from the widely used Defects4J benchmark was included, raising the possibility of its inclusion in GPT training data as well. This makes it difficult to tell how well LLM-based results on Defects4J would generalize, as it would be unclear whether a technique's performance is due to LLM generalization or memorization. To remedy this issue and facilitate continued research on LLM-based SE, we present the GitHub Recent Bugs (GHRB) dataset, which includes 76 real-world Java bugs that were gathered after the OpenAI data cut-off point.

    A clustering approach for detecting defects in technical documents

    Get PDF
    Requirements are usually “hand-written” and suffer from several problems such as redundancy and inconsistency. Redundancy and inconsistency between requirements, or between sets of requirements, negatively impact the success of the final product. Handling these issues manually is time-consuming and very costly. The main contribution of this paper is the use of the k-means algorithm for redundancy and inconsistency detection in a new context, namely Requirements Engineering. We also introduce a pre-processing step based on Natural Language Processing (NLP) techniques to assess its impact on the k-means results. We use Part-Of-Speech (POS) tagging and noun chunking to detect the technical business terms associated with the requirements documents that we analyze. We evaluate this approach on real industrial datasets. The results show the effectiveness of the k-means clustering algorithm, especially when combined with the pre-processing step.
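
    A rough sketch of the pipeline described above, assuming spaCy for the POS/noun-chunk pre-processing and scikit-learn for TF-IDF vectorisation and k-means; the toy requirements, the number of clusters, and the specific libraries are illustrative choices, not the paper's exact setup.

    ```python
    import spacy                                   # requires: python -m spacy download en_core_web_sm
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    requirements = [
        "The pump shall stop when the tank pressure exceeds 5 bar.",
        "The pump must be stopped if the tank pressure is above 5 bar.",
        "The display shall show the current tank pressure in bar.",
    ]

    nlp = spacy.load("en_core_web_sm")

    def preprocess(requirement: str) -> str:
        """Keep noun chunks as a rough approximation of technical business terms."""
        return " ".join(chunk.text.lower() for chunk in nlp(requirement).noun_chunks)

    texts = [preprocess(r) for r in requirements]
    vectors = TfidfVectorizer().fit_transform(texts)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

    # Requirements falling into the same cluster are candidates for redundancy
    # or inconsistency review.
    for cluster in set(labels):
        group = [requirements[i] for i, label in enumerate(labels) if label == cluster]
        if len(group) > 1:
            print("possible redundancy/inconsistency:", group)
    ```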